Don't always send update metadata requests to the same broker #395

Merged

@signine (Contributor) commented on Jan 21, 2020

We have been trying to use KafkaEx with an app that generates high produce rates.
At first we used a single KafkaEx worker (the default one), but a few moments after starting the app, produce requests began to time out. One worker could not keep up with the rate of incoming produce requests, so its mailbox started to fill up.

So next we tried a pool of workers, one per topic and partition, like Brod. The app was now stable, but we noticed something odd with the brokers: one of them always had significantly higher system load and network traffic (bytes out) than the others. After investigating, we found that the extra load came from the periodic metadata update requests made by all the workers.

For requests such as fetching metadata and api_versions, KafkaEx iterates through every broker it knows about until it gets a successful response. It normally tries the brokers in the same sequence every time, and the first one usually succeeds, so the first broker in the list receives a disproportionate share of the load.
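
To illustrate why a fixed iteration order concentrates load, here is a minimal sketch in Python rather than Elixir. It is not KafkaEx code: `first_successful_response`, `send_request`, and the broker names are hypothetical stand-ins, and a request is assumed to "succeed" whenever the callback returns something other than `None`.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple


def first_successful_response(
    brokers: Sequence[str],
    send_request: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Try brokers in the given order; return (broker, response) for the first success."""
    for broker in brokers:
        response = send_request(broker)
        if response is not None:  # assumption: None means the request failed
            return broker, response
    return None  # no broker answered


def simulate_fixed_order(brokers: Sequence[str], requests: int) -> Counter:
    """Count which broker ends up serving each request when all brokers are healthy."""
    hits: Counter = Counter()
    for _ in range(requests):
        result = first_successful_response(brokers, lambda _broker: "metadata")
        if result is not None:
            hits[result[0]] += 1
    return hits


# Hypothetical three-broker cluster; every request lands on the first entry.
fixed_hits = simulate_fixed_order(["broker-1", "broker-2", "broker-3"], 300)
```

With every broker healthy, the later entries in the list are only ever contacted as a fallback, which matches the skew observed on one broker.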

In this PR we randomize the broker list before sending any requests in order to spread the load of update metadata requests evenly across all brokers.
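
The effect of randomizing the order can be sketched the same way. This is an illustrative Python simulation under stated assumptions, not the change made in this PR: `shuffled_brokers` and `simulate_shuffled` are hypothetical names, all brokers are assumed healthy, and the first broker in the shuffled order is assumed to answer. (In Elixir, the standard library provides `Enum.shuffle/1` for this kind of reordering.)

```python
import random
from collections import Counter
from typing import List, Sequence


def shuffled_brokers(brokers: Sequence[str], rng: random.Random) -> List[str]:
    """Return a randomly ordered copy of the broker list; the input is left untouched."""
    order = list(brokers)
    rng.shuffle(order)
    return order


def simulate_shuffled(brokers: Sequence[str], requests: int, seed: int = 0) -> Counter:
    """Count which broker is tried first for each request when the list is shuffled."""
    rng = random.Random(seed)  # seeded only to make the simulation repeatable
    hits: Counter = Counter()
    for _ in range(requests):
        # Assumption: every broker is healthy, so the first one tried answers.
        hits[shuffled_brokers(brokers, rng)[0]] += 1
    return hits


# Hypothetical three-broker cluster; each broker should serve roughly a third.
shuffled_hits = simulate_shuffled(["broker-1", "broker-2", "broker-3"], 3000)
```

Because the fallback loop is unchanged, a failed broker is still skipped in favour of the next one; only the starting point varies from request to request.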

Testing:
All tests passed locally.
I manually tested the behaviour by logging the broker list in first_broker_response().

…date requests don't always go to the same broker
@sourcelevel-bot

Hello, @shamilish! This is your first Pull Request that will be reviewed by SourceLevel, an automatic Code Review service. It will leave comments on this diff with potential issues and style violations found in the code as you push new commits. You can also see all the issues found on this Pull Request on its review page. Please check our documentation for more information.

@jbruggem (Collaborator) commented

Re-started the failing test and everything passes. The change looks sound, but I don't know enough about this part to be sure that it's the logical thing to do :). It's a tiny change, so I have no doubt another maintainer will pick it up very quickly!

@bjhaid (Member) left a comment

Thanks!

@joshuawscott (Member) left a comment

This makes a lot of sense, thanks for this!

@joshuawscott merged commit c1f94e3 into kafkaex:master on Jan 21, 2020
@joshuawscott mentioned this pull request on Jul 14, 2020